Statistical learning methods as a preprocessing step for survival analysis: evaluation of concept using lung cancer data
Abstract
BACKGROUND Statistical learning (SL) techniques can address non-linear relationships and small datasets, but they do not provide output that has an epidemiologic interpretation.
METHODS A small set of clinical variables (CVs) from patients with stage 1 non-small cell lung cancer was used to evaluate an approach that uses SL methods as a preprocessing step for survival analysis. A probabilistic neural network (PNN) was trained stochastically with differential evolution (DE) optimization. Survival scores were derived stochastically by combining the CVs with the PNN. Patients (n = 151) were dichotomized into favorable (n = 92) and unfavorable (n = 59) survival outcome groups. The PNN-derived scores were used in logistic regression (LR) models to predict a favorable survival outcome and were integrated into the survival analysis (i.e., Kaplan-Meier analysis and Cox regression). The hybrid models were compared with the corresponding models built on the raw CVs. The area under the receiver operating characteristic curve (Az) was used to compare predictive capability. Odds ratios (ORs) and hazard ratios (HRs) with 95% confidence intervals (CIs) were used to compare disease associations.
RESULTS Among the models using raw CVs, the LR model with the best predictive capability gave Az = 0.703. Controlling for gender and tumor grade, the OR = 0.63 (CI: 0.43, 0.91) per standard deviation (SD) increase in age indicates that increasing age confers an unfavorable outcome. The hybrid LR model, which combined age and tumor grade with the PNN and controlled for gender, gave Az = 0.778. The PNN score and age relate to risk in opposite directions. The OR = 0.27 (CI: 0.14, 0.53) per SD increase in PNN score indicates that patients with lower scores have an unfavorable outcome.
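The abstract does not specify the PNN architecture, the DE settings, or the training objective, so the following is only one plausible reading, not the authors' method. It sketches a two-class PNN (a Gaussian Parzen-window classifier) with one smoothing width per clinical variable, and uses DE to search the log-widths so as to maximize the leave-one-out Az. The function names (`pnn_score`, `auc`, `loo_auc`, `train_pnn_de`), the DE/rand/1/bin scheme, the parameter values (`pop_size`, `gens`, `f`, `cr`, `bounds`), the convention that label 1 means a favorable outcome, and the toy data are all assumptions made for illustration.

```python
import math
import random

# Minimal sketch, not the authors' implementation. Assumptions (hypothetical):
# a two-class PNN (Gaussian Parzen-window classifier) with one smoothing width
# per clinical variable, widths tuned by DE to maximize leave-one-out Az.


def pnn_score(x, data, labels, sigmas, skip=None):
    """Equal-prior PNN output f1 / (f0 + f1); label 1 = favorable outcome."""
    dens, counts = [0.0, 0.0], [0, 0]
    for i, (xi, yi) in enumerate(zip(data, labels)):
        if i == skip:  # leave-one-out: exclude the patient being scored
            continue
        d2 = sum(((a - b) / s) ** 2 for a, b, s in zip(x, xi, sigmas))
        dens[yi] += math.exp(-0.5 * d2)
        counts[yi] += 1
    f0 = dens[0] / counts[0] if counts[0] else 0.0
    f1 = dens[1] / counts[1] if counts[1] else 0.0
    return 0.5 if f0 + f1 == 0.0 else f1 / (f0 + f1)


def auc(labels, scores):
    """Az via the Mann-Whitney statistic; ties between classes count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def loo_auc(log_sigmas, data, labels):
    """DE objective: leave-one-out Az for a vector of log smoothing widths."""
    sigmas = [math.exp(v) for v in log_sigmas]
    scores = [pnn_score(x, data, labels, sigmas, skip=i)
              for i, x in enumerate(data)]
    return auc(labels, scores)


def train_pnn_de(data, labels, pop_size=12, gens=30, f=0.7, cr=0.9,
                 bounds=(-2.0, 2.0), seed=0):
    """DE/rand/1/bin over log-widths; returns (sigmas, best LOO Az)."""
    rng = random.Random(seed)
    dim, (lo, hi) = len(data[0]), bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    fit = [loo_auc(p, data, labels) for p in pop]
    for _ in range(gens):
        for i in range(pop_size):
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one gene from the mutant
            trial = [
                min(hi, max(lo, pop[a][k] + f * (pop[b][k] - pop[c][k])))
                if k == forced or rng.random() < cr else pop[i][k]
                for k in range(dim)
            ]
            trial_fit = loo_auc(trial, data, labels)
            if trial_fit >= fit[i]:  # greedy one-to-one replacement
                pop[i], fit[i] = trial, trial_fit
    best = max(range(pop_size), key=fit.__getitem__)
    return [math.exp(v) for v in pop[best]], fit[best]


# Toy usage on two well-separated groups (illustrative data, not the study's).
X = [[1.0, 1.1], [1.2, 0.9], [0.8, 1.0], [1.1, 1.2],
     [-1.0, -0.9], [-1.2, -1.1], [-0.8, -1.0], [-0.9, -1.2]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
sigmas, best_az = train_pnn_de(X, y)
print("widths:", [round(s, 3) for s in sigmas], "LOO Az:", best_az)
```

Because the PNN output is a single score in [0, 1], it can then be entered into an LR or Cox model like any other continuous covariate, for example after standardizing it so that effects are expressed per SD.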
The tumor-grade-adjusted hazard for patients above the median age compared with those below the median was HR = 1.78 (CI: 1.06, 3.02), whereas the hazard for patients below the median PNN score compared with those above the median was HR = 4.0 (CI: 2.13, 7.14).
CONCLUSION We have provided preliminary evidence that SL preprocessing may offer benefits over accepted approaches. Further evaluation with different datasets will be required to confirm these findings.
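The survival comparisons dichotomize a variable at its median and estimate an HR for one group relative to the other. A minimal sketch of that step, under stated assumptions: the helper names `median_split`, `_score_info`, and `cox_binary` are hypothetical, ties are handled with the Breslow approximation, the CI is a Wald interval, and, unlike the reported analysis, the model has no adjustment for tumor grade. The times, events, and scores are invented for illustration.

```python
import math
import statistics

# Minimal sketch (hypothetical helpers, not the authors' code): dichotomize a
# variable at its median, then fit a univariate Cox model for the resulting
# 0/1 indicator by Newton-Raphson on the Breslow partial likelihood.
# Unlike the abstract's analysis, there is no adjustment for tumor grade.


def median_split(values, below=True):
    """1 for values strictly below (or above) the median, else 0."""
    m = statistics.median(values)
    return [int(v < m) if below else int(v > m) for v in values]


def _score_info(beta, times, events, z):
    """Score U(beta) and information I(beta) of the log partial likelihood."""
    u = info = 0.0
    for t in sorted({ti for ti, ei in zip(times, events) if ei}):
        n1 = sum(zi for ti, zi in zip(times, z) if ti >= t)  # at risk, z = 1
        n = sum(1 for ti in times if ti >= t)                # at risk, total
        d = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        d1 = sum(zi for ti, ei, zi in zip(times, events, z) if ti == t and ei)
        p = n1 * math.exp(beta) / (n - n1 + n1 * math.exp(beta))
        u += d1 - d * p
        info += d * p * (1.0 - p)
    return u, info


def cox_binary(times, events, z, tol=1e-10, max_iter=50):
    """Return (HR, lower 95%, upper 95%) for z = 1 versus z = 0 (Wald CI)."""
    beta = 0.0
    for _ in range(max_iter):
        u, info = _score_info(beta, times, events, z)
        if info == 0.0:
            raise ValueError("no information: z is constant in every risk set")
        step = u / info
        beta += step
        if abs(step) < tol:
            break
    _, info = _score_info(beta, times, events, z)  # information at the MLE
    se = 1.0 / math.sqrt(info)
    return math.exp(beta), math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se)


# Toy data (illustrative only): follow-up time, event = 1 if the patient died.
times = [2, 3, 5, 8, 11, 4, 6, 9, 12, 15]
events = [1, 1, 1, 1, 0, 1, 1, 0, 1, 0]
pnn = [0.2, 0.1, 0.3, 0.4, 0.35, 0.9, 0.7, 0.8, 0.6, 0.75]  # hypothetical scores
low = median_split(pnn, below=True)  # 1 = below the median score
print(cox_binary(times, events, low))
```

Swapping the reference group negates the Cox coefficient, so the HR and both CI bounds are inverted. For instance, the reported HR = 4.0 (CI: 2.13, 7.14) for patients below the median PNN score is equivalent to HR ≈ 0.25 (CI: 0.14, 0.47) for patients above it.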
Similar sources
Prediction of Breast Tumor Malignancy Using Neural Network and Whale Optimization Algorithms (WOA)
Introduction: Breast cancer is the most prevalent cause of cancer mortality among women. Early diagnosis of breast cancer gives patients longer survival. The present study aims to provide an algorithm for more accurate prediction and more effective decision-making in the treatment of patients with breast cancer. Methods: The present study was an applied, descriptive-analytical study based on the ...
Survival and Factors Affecting it in Lung Cancer Patients Referred to Imam Khomeini Clinic in Hamadan Province
Background: Lung cancer is one of the most common cancers and the leading cause of death due to cancer in the world. It has the highest mortality rate compared to breast, prostate, and other cancers. Different factors can be effective in the survival of lung cancer patients. The present study has evaluated survival and its related factors. Materials and Methods: The present study was performed...
Behavioral Analysis of Traffic Flow for an Effective Network Traffic Identification
Fast and accurate network traffic identification is becoming essential for network management, high quality of service control and early detection of network traffic abnormalities. Techniques based on statistical features of packet flows have recently become popular for network classification due to the limitations of traditional port- and payload-based methods. In this paper, we propose a metho...
The Effect of Time-dependent Prognostic Factors on Survival of Non-Small Cell Lung Cancer using Bayesian Extended Cox Model
Background: Lung cancer is one of the most common cancers around the world. The aim of this study was to use the Extended Cox Model (ECM) with a Bayesian approach to examine the behavior of potential time-varying prognostic factors of non-small cell lung cancer. Materials and Methods: The survival status of all 190 patients diagnosed with non-small cell lung cancer referred to hospitals in ...
The prediction of lymphedema via the combination of the selected data mining algorithms
Background: Breast cancer is the second leading cause of cancer death in women, after lung cancer. Due to the importance of predicting this disease, the use of data mining methods in medical research is more significant than before. Data mining algorithms can be a great help in preventing the development of lymphedema in patients. The aim of this study was to create a diagnosis system that can ...